Abstract

This paper presents a method of choosing the number of states of an HMM based on the number
of critical points in motion capture data. The choice of Hidden Markov Model (HMM)
parameters is crucial for the recognizer's performance, as it is the first step of training and
cannot be corrected automatically within the HMM. In this article we define a predictor of the
number of states based on the number of critical points of the sequence and test its
effectiveness on sample data.

1 Introduction 

Hidden Markov Models (HMMs) are presently a popular method for recognizing patterns in data
sequences, especially time sequences. Their efficiency, however, depends on a number of parameters
that have to be decided a priori, before an HMM is trained for recognition.

One such parameter is the number of states: an HMM is constructed with a predefined number
of states, which does not change later during the training process. This number affects both
the detection ratio of the constructed HMM and its complexity. It is important to choose a
well-performing number of states, one that gives a sufficient detection rate without producing
an HMM of excessive complexity.

The most often used method is to try out several numbers of states and pick the best-performing
one. This is, however, a time-consuming task that adds to the cost of HMM construction. A method
that estimates this parameter before the HMM is constructed, and still gives good results,
would therefore be a desirable addition to the process.

In this article we consider data from motion capture sensors. For these data, we construct
predictors of the number of HMM states based on the median number of critical points in the
training sequences. We then define a measure of effectiveness of such a predictor based on
Akaike's Information Criterion and test the predictors against this measure.

This paper is organized as follows: in Section 2 we discuss related work concerning both the
use of HMMs and the use of critical points in the recognition process. Section 3 contains a brief
introduction to Hidden Markov Models. Section 4 describes our approach, both to constructing
the predictor and to testing its quality, and Section 5 presents our experiments and results.
2 Related work

Hidden Markov Models (HMMs) as a method of modelling data sequences have been well developed
and frequently used. While they were at first most often used in the area of cryptography,
their applications quickly widened. They are nowadays a method of choice for speech recognition
systems (see e.g. [T]) and are used in such areas as protein classification and alignment (as
in [5]) and gesture classification (see [3]).

The use of multi-dimensional HMMs in gesture recognition is described in [3], where the
classification is based on input from a digital camera. Wilson and Bobick in [5] describe the use
of parametric HMMs as well as online learning for gathering more gesture executions.

Gesture recognition has mainly been approached as a problem of classifying distinct
gestures captured by a digital camera as a sequence of images. Relatively few works assume a
different way of acquiring data, such as motion capture input. In [B], a database is presented
that contains input from motion capture gloves for a significant number of executions of 22
gestures. The gestures from this database have been analysed for separability in [7], and the
application of HMMs to such input data has been successfully tested in [3]. In [5] a critical
points approach is used for classification and proven effective.

Given predefined priors (such as the number of states) there are a number of ways to build an
HMM. Many new ideas have been introduced since the traditional Baum-Welch algorithm (described
in [3]). Some begin construction with a number of states (either high or low) and then either
increase or decrease it towards the predefined prior. Specific algorithms for state-splitting
[To] and state-merging (such as Bayesian model merging, [Tl]) have been designed (e.g. the
Gaussian splitting-merging algorithm tested in speech recognition systems in [H]). In [T3] the
Viterbi Path Counting algorithm was tested and proven effective alongside the traditional
Baum-Welch algorithm of HMM training.

The number of HMM states is an element of the topology that is decided a priori and then
remains unchanged during the learning phase. Since priors such as the number of states determine
the effectiveness of an HMM, the problem has so far been approached in several ways. The most
popular are greedy algorithms, which try out several possibilities and choose the one with the
best results. Although this method gives results, it tends to be very time consuming. There have
been some attempts at dealing with the problem. Bakis modelling suggests choosing the number of
states corresponding to the length of the input sequence (so that every datapoint has a
corresponding state in the HMM), which tends to produce HMMs with a very large number of states.
In [l^ the authors propose choosing between a constant number and, alternatively, a number
depending on the length of the feature vector, and achieve a visible effect. In [T] Bakis
modelling is improved by its iterative application and a more sophisticated dependency function.
This method, although more effective in state number estimation than Bakis modelling, requires a
large number of computations. In [4] the Bakis model is used for gesture recognition to some
effect.

3 Hidden Markov Models 

Our input data (e.g. a recorded gesture) is a sequence of symbols $O = O_1 O_2 \cdots O_r$, where
each $O_i$ is a member of the alphabet set, $O_i \in \mathcal{L} = \{L_1, \dots, L_k\}$. For motion
capture data, $O_i$ could correspond to the result of Vector Quantization of sensor measurements.

We model a set of input data (sequences from the alphabet $\mathcal{L}$, e.g. several recordings
of a defined gesture) with a discrete Hidden Markov Model, defined as

$$\lambda = (S, T, \pi, E), \qquad (1)$$

where $S = \{S_1, \dots, S_n\}$ is a set of states, $T \in \mathbb{R}^{n \times n}$ a stochastic¹
transition matrix, $\pi$ a probability vector representing the probability distribution of the
starting state and $E \in \mathbb{R}^{n \times k}$ a stochastic emission matrix.

Given an HMM $\lambda$ and a sequence $O$ we can compute the log-likelihood
$\log p(O \mid \lambda)$, the logarithm of the probability that $O$ was generated by $\lambda$.
This can be done e.g. with the Forward Algorithm [1]. Given a set of HMMs $\{\lambda_i\}$ we can
use the likelihood to determine the most probable HMM associated with the sequence from

$$\arg\max_i \log p(O \mid \lambda_i). \qquad (2)$$

If each $\lambda_i$ corresponds to a different gesture, we can use this to recognize the sequence
as one of those gestures.
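
The scoring and recognition step in (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names `log_likelihood` and `recognize` are hypothetical, symbols are assumed to be encoded as 0-based integers, and per-step rescaling is added to avoid numerical underflow on long sequences.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled Forward Algorithm: log p(O | lambda) for a discrete HMM.

    obs   -- list of 0-based symbol indices (an assumption of this sketch)
    start -- initial state distribution, length n
    trans -- n x n row-stochastic transition matrix
    emit  -- n x k row-stochastic emission matrix
    """
    n = len(start)
    # alpha[s] = p(O_1, first state = s)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # propagate one step through the transition matrix, then emit obs[t]
            alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                     for s in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to keep values in range
    return log_p

def recognize(obs, models):
    """Index i of the model maximizing log p(O | lambda_i).

    models -- list of (start, trans, emit) triples, one per gesture
    """
    scores = [log_likelihood(obs, *m) for m in models]
    return max(range(len(models)), key=scores.__getitem__)
```

The sum of the logarithms of the scaling factors equals the log-likelihood, so the result matches the unscaled recursion whenever the latter does not underflow.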

A standard way to build an HMM given a number of states $n$ and a set of reference
sequences $\mathcal{T} = \{O_1, \dots, O_m\}$ is to use the Baum-Welch algorithm [T], which
iteratively maximizes the likelihoods of the sequences from $\mathcal{T}$. The number of states
$n$ for this approach must be known beforehand.

The performance of an HMM depends both on the set of reference sequences $\mathcal{T}$ and on
its number of states $n$. While the set of references is usually provided beforehand, the number
of states must be estimated. The choice of $n$ determines both how effectively the HMM will
recognize sequences and what its model complexity (number of parameters) will be. Estimating $n$
is therefore a matter of balancing model complexity against fit to the data.

The standard approach in this case is to use information criteria (such as AIC, the Akaike
Information Criterion). This approach, however, is de facto a greedy algorithm that needs to build
several HMMs with different $n$ and find out which of them is best under the selected information
criterion. This creates a need for an algorithm that selects $n$ without time-consuming testing.

4 Our Approach 

Our goal is to show an effective method of choosing the number of states of a detecting HMM for
input data in the form of matrices $M \in \mathbb{R}^{m \times n}$. Our thesis is that setting the
number of HMM states according to the number of critical points of the dataset in question is a
viable method of estimating HMM parameters. By critical points we understand the points at the
ends of the sampling interval and the local maxima and local minima of the data sequence.

4.1 Input 

Let $\mathcal{I} = \{1, \dots, I\}$ be a set of class indices, $\mathcal{K} = \{1, \dots, K\}$ be
the set of class example indices and let $\{G^{i,k}\}_{i \in \mathcal{I}, k \in \mathcal{K}}$ be a
set of matrices where $G^{i,k} \in \mathbb{R}^{J \times l(i,k)}$. We understand $G^{i,k}$ as the
$k$-th example of a matrix from class $i$. In the context of motion capture data it represents a
single execution of a gesture of class $i$. Since we consider motion capture data, we have to
assume that each matrix $G^{i,k}$ may have a different number of columns, which represents the
sequence length. We denote this number as $l(i,k)$. We treat each matrix $G^{i,k}$ as an ordered
set of sequences (the matrix rows) $G^{i,k}_j$, $j \in \mathcal{J}$, of length $l(i,k)$. Therefore
each sequence $G^{i,k}_j$ we analyse is the $j$-th row of the matrix $G^{i,k}$, where
$\mathcal{J} = \{1, \dots, J\}$.

¹ By stochastic we understand a matrix $Z \in \mathbb{R}^{i \times j}$ with non-negative entries
whose rows each sum to 1, i.e. $\sum_n Z_{mn} = 1$ for every row $m$ and $Z_{mn} \geq 0$.

4.2 Preprocessing 

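With each sequence $G^{i,k}_j$ we proceed as follows. First, the sequence $G^{i,k}_j$ is
polynomially interpolated and resampled to length $M$. The next step is normalization of the
values $G^{i,k}_{j,m}$ (the $m$-th element of the $j$-th row of $G^{i,k}$). For a sequence
$G^{i,k}_j$, let

$$\mu = \frac{1}{M} \sum_{m=1}^{M} G^{i,k}_{j,m}, \qquad \sigma = \sqrt{\frac{1}{M} \sum_{m=1}^{M} \left(G^{i,k}_{j,m} - \mu\right)^2}, \qquad (3)$$

$\mu$ being the mean value of the given sequence and $\sigma$ its standard deviation. The
normalization of the sequence $G^{i,k}_j$ can be represented as the transformation
$P : \mathbb{R}^M \to \mathbb{R}^M$, applied to each element as

$$P(G^{i,k}_{j,m}) = \frac{G^{i,k}_{j,m} - \mu}{\sigma}. \qquad (4)$$

In other words, each value $G^{i,k}_{j,m}$ of the input sequence is mean-shifted and normalized
by the standard deviation. From this point onward we understand $G^{i,k}_j$ as sequences of
normalized values.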
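
The preprocessing step can be sketched as follows. This is only an illustration: the names `resample_linear` and `normalize` are hypothetical, linear interpolation is swapped in for the polynomial interpolation described above to keep the example short, and a constant sequence (zero standard deviation) is mapped to zeros, a case the text does not discuss.

```python
import math

def resample_linear(seq, M):
    """Resample seq to M points.

    Assumption: linear interpolation stands in for the polynomial
    interpolation used in the paper.
    """
    if len(seq) == 1:
        return [float(seq[0])] * M
    out = []
    for m in range(M):
        # position of the m-th output sample on the original index axis
        pos = m * (len(seq) - 1) / (M - 1) if M > 1 else 0.0
        lo = min(int(pos), len(seq) - 2)
        frac = pos - lo
        out.append(seq[lo] * (1 - frac) + seq[lo + 1] * frac)
    return out

def normalize(seq):
    """Mean-shift and divide by the standard deviation, as in (3) and (4)."""
    mu = sum(seq) / len(seq)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in seq) / len(seq))
    if sigma == 0.0:
        return [0.0] * len(seq)  # assumption: constant sequences map to zeros
    return [(x - mu) / sigma for x in seq]
```

A sequence would then be prepared with `normalize(resample_linear(raw, 64))`, where 64 is the resampling length used in the experiments.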

4.3 Computing the number of critical points 

Next, we calculate the critical points, by which we understand the end points of the sequence
and its local extrema. We consider a point $G^{i,k}_{j,m}$ to be a local maximum if its value is
the highest in the interval $[m - \gamma, m + \gamma]$, where $\gamma \geq 1$. If $m - \gamma < 1$
or $m + \gamma > M$, we pad with the first or last value of the sequence as appropriate. Local
minima are calculated in a similar way. For a sequence $G^{i,k}_j$, we denote the number of local
maxima as $cp^{max}(G^{i,k}_j)$ and the number of local minima as $cp^{min}(G^{i,k}_j)$. Counting
also the beginning and end of the sequence, the total number of critical points equals

$$cp(G^{i,k}_j) = cp^{max}(G^{i,k}_j) + cp^{min}(G^{i,k}_j) + 2. \qquad (5)$$

The computed value $cp(G^{i,k}_j)$ is the basis of our predictor. It is our thesis that a
predictor close to $cp(G^{i,k}_j)$ gives good results when deciding the number of states of an
HMM that recognizes $G^{i,k}_j$.
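
The count in (5) could be computed along the following lines. The function name `count_critical_points` is hypothetical, and the sketch assumes strict comparisons, so points on a plateau are not counted as extrema; the text does not say how ties are handled.

```python
def count_critical_points(seq, gamma=1):
    """Number of critical points of seq: local maxima + local minima + 2 end points."""
    M = len(seq)
    # pad with copies of the first and last values, as described in the text
    padded = [seq[0]] * gamma + list(seq) + [seq[-1]] * gamma
    n_max = n_min = 0
    for m in range(M):
        window = padded[m:m + 2 * gamma + 1]          # centred on seq[m]
        others = window[:gamma] + window[gamma + 1:]  # neighbours without the centre
        if all(seq[m] > v for v in others):
            n_max += 1
        elif all(seq[m] < v for v in others):
            n_min += 1
    # the two end points are added separately; because of the padding and the
    # strict comparison they are never counted as extrema themselves
    return n_max + n_min + 2
```

For example, `count_critical_points([0, 1, 0, -1, 0])` finds one maximum and one minimum and returns 4, while a monotone sequence yields only the two end points.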

4.4 Test 

To test our thesis, we need to observe how the predictor behaves when used in practice. We
decided to measure the quality of our predictor in the context of the Akaike Information
Criterion, which combines both effectiveness and computational complexity. To do this we propose
the following test.

4.4.1 Clustering of values 



Having constructed a set of normalized sequences of equal length, we now proceed to clusterization.
First, the k-means clustering method is used to partition the values $G^{i,k}_{j,m}$,
$i \in \mathcal{I}$, $j \in \mathcal{J}$, $k \in \mathcal{K}$, $m = 1, \dots, M$, into
$c \in \mathcal{C}$ clusters, where $\mathcal{C} = \{c_l, c_l + 1, \dots, c_h\}$.

Therefore, after this step, each sequence $G^{i,k}_j$ generates $c_h - c_l + 1$ sequences
$G^{i,k,c}_j$ consisting of positive integers (indices of clusters).
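
A minimal sketch of this discretization is given below. The text does not specify the k-means variant, so plain Lloyd's algorithm on the pooled scalar values with a deterministic initialisation is assumed here; the names `kmeans_1d` and `quantize` are hypothetical.

```python
def kmeans_1d(values, c, iters=100):
    """Lloyd's algorithm on scalar values; returns the c cluster centres, sorted."""
    srt = sorted(values)
    # assumption: deterministic initialisation from evenly spaced order statistics
    centres = [srt[(2 * q + 1) * len(srt) // (2 * c)] for q in range(c)]
    for _ in range(iters):
        groups = [[] for _ in range(c)]
        for v in values:
            nearest = min(range(c), key=lambda q: abs(v - centres[q]))
            groups[nearest].append(v)
        # an empty cluster keeps its previous centre
        new = [sum(g) / len(g) if g else centres[q] for q, g in enumerate(groups)]
        if new == centres:
            break
        centres = new
    return sorted(centres)

def quantize(seq, centres):
    """Replace each value by the 1-based index of its nearest centre."""
    return [1 + min(range(len(centres)), key=lambda q: abs(v - centres[q]))
            for v in seq]
```

Running `kmeans_1d` once per value of $c$ on the pooled values and then applying `quantize` to every sequence produces the $c_h - c_l + 1$ integer sequences per input sequence described above.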

4.4.2 HMM construction and computation of AIC coefficient for each sequence 

For HMM construction, we take all $G^{i,k,c}_j$ and group them over $k$; in other words, for each
sensor of each gesture and a given number of clusters we take all executions and put them in one
set. That is, for every gesture $i$, every sensor $j$ and a chosen number of clusters $c$ we form
the set $\{G^{i,k,c}_j\}_{k \in \mathcal{K}}$. To each of these sets we assign $cp(G^i_j)$, the
median of the set $\{cp(G^{i,k}_j)\}_{k \in \mathcal{K}}$, which is independent of the number of
clusters $c$. Then we create the pairs
$F_{ijc} = \left(\{G^{i,k,c}_j\}_{k \in \mathcal{K}}, cp(G^i_j)\right)$.

The pairs $F_{ijc}$, $i \in \mathcal{I}$, $j \in \mathcal{J}$, $c \in \mathcal{C}$, are the data
used for HMM construction. For each pair $F_{ijc}$ we construct $(st_h - st_l + 1)$ HMMs, referred
to as $\lambda(F_{ijc}, n)$, where $n = st_l, \dots, st_h$ is the number of states.

Then, using the standard method, for each $\lambda(F_{ijc}, n)$ we compute the logarithm of
probability for each sequence $G^{i,k,c}_j$ from $F_{ijc}$. That means that for each
$G^{i,k,c}_j$ we obtain $(st_h - st_l + 1)$ logarithms of probability
$\log p(G^{i,k,c}_j \mid \lambda(F_{ijc}, n))$, where $n = st_l, \dots, st_h$.

To rate those results we consider two factors: the logarithm of probability
$\log p(G^{i,k,c}_j \mid \lambda(F_{ijc}, n))$ and the complexity cost $q$. The computational
complexity of generating a result with an HMM of $n$ states is $(n^2 + na)$, since such an HMM
operates on a transition matrix $T \in \mathbb{R}^{n \times n}$ and an emission matrix
$E \in \mathbb{R}^{n \times a}$, with $a$ the size of the alphabet.

There are many criteria that include both of these factors; in our approach we use the Akaike
Information Criterion. We compute the AIC value for a set $F_{ijc}$ and a number of states $n$
as follows:

$$AIC(F_{ijc}, n) = -2 \sum_{k=1}^{K} \log p(G^{i,k,c}_j \mid \lambda(F_{ijc}, n)) + 2q, \qquad (6)$$

where $q = n^2$ (representing the size of the transition matrix used by the HMM). A separate
$AIC(F_{ijc}, n)$ value is computed for each number of HMM states $n$.
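
Given the per-sequence log-likelihoods, the value in (6) is a one-line computation. The sketch below uses hypothetical names (`aic`, `aic_by_states`) and assumes the log-likelihoods have already been obtained from the trained HMMs.

```python
def aic(log_likelihoods, n_states):
    """AIC as in (6): -2 * (sum of log-likelihoods) + 2q, with q = n^2."""
    q = n_states ** 2
    return -2.0 * sum(log_likelihoods) + 2.0 * q

def aic_by_states(loglik_by_n):
    """Map each candidate number of states n to its AIC value.

    loglik_by_n -- dict: n -> list of the K log-likelihoods under the n-state HMM
    """
    return {n: aic(lls, n) for n, lls in loglik_by_n.items()}
```

For instance, two sequences with log-likelihoods -10 and -12 under a 3-state model give $-2 \cdot (-22) + 2 \cdot 9 = 62$.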

4.4.3 Positioning 

We want to observe whether the number of critical points $cp(G^i_j)$ can be used to build a good
predictor of the number of states $n$ of the model with the lowest AIC value (the preferable
model). To do that, for each $F_{ijc}$ and for a given predictor $cp$ we calculate a value
$\xi(ijc, cp) \in [0, 1]$, which helps us rate different predictors.

For a given range of $n = st_l, \dots, st_h$, a set $F_{ijc}$ and a predictor $cp$ we define three
values:

• $AIC(F_{ijc})_{min} = \min_{n = st_l, \dots, st_h} AIC(F_{ijc}, n)$, the lowest value of AIC for
the given set $F_{ijc}$;

• $AIC(F_{ijc})_{max} = \max_{n = st_l, \dots, st_h} AIC(F_{ijc}, n)$, the highest value of AIC for
the given set $F_{ijc}$;

• $AIC(F_{ijc})_{cp} = AIC(F_{ijc}, cp)$, the value of AIC for the number of HMM states equal to
our predictor $cp$.

We define a measure of similarity of $AIC(F_{ijc})_{cp}$ to the minimum as

$$\xi(ijc, cp) = \frac{AIC(F_{ijc})_{cp} - AIC(F_{ijc})_{min}}{AIC(F_{ijc})_{max} - AIC(F_{ijc})_{min}}.$$

It is easy to see that $\xi(ijc, cp) \in [0, 1]$ and that, for a given $AIC(F_{ijc})_{max}$,
$\xi(ijc, cp) \to 0$ as $AIC(F_{ijc})_{cp} \to AIC(F_{ijc})_{min}$. Also, $\xi(ijc, cp) \to 1$
when $AIC(F_{ijc})_{cp} \to AIC(F_{ijc})_{max}$.

Therefore, we conclude that if $\xi(ijc, cp)$ is close to zero, then $AIC(F_{ijc})_{cp}$ is near
$AIC(F_{ijc})_{min}$, which means that $cp$ gives a close estimate of the number of HMM states
that produces the minimal value of the Akaike Information Criterion. In that case $cp$ is a good
predictor of $n$ in terms of the Akaike Information Criterion.
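
The positioning measure can be illustrated with a few lines of Python. The function name `positioning` is hypothetical, the predictor is assumed to lie within the tested range of $n$, and the degenerate case of all AIC values being equal (which the text does not cover) is mapped to 0 here.

```python
def positioning(aic_by_n, cp):
    """Relative position of AIC at n = cp between the minimal and maximal AIC.

    aic_by_n -- dict: number of states n -> AIC value for that n
    cp       -- predicted number of states (must be a key of aic_by_n)
    """
    lo = min(aic_by_n.values())
    hi = max(aic_by_n.values())
    if hi == lo:
        return 0.0  # assumption: every n is equally good, so any predictor is optimal
    return (aic_by_n[cp] - lo) / (hi - lo)
```

With AIC values `{2: 100.0, 3: 80.0, 4: 120.0}`, a predictor of 3 scores 0 (it hits the minimum), a predictor of 4 scores 1 and a predictor of 2 scores 0.5.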



5 Experiments 

The objective of the experiments is to determine whether the predictor $cp$, based on the number
of critical points in the detected sequences $G^{i,k}_j$, is a good predictor of the number of HMM
states in terms of the Akaike Information Criterion. To do that we calculate the values of
$\xi(ijc, cp)$ for the analyzed data and aggregate them into the average value

$$\xi(cp) = \frac{\sum_{i,j,c} \xi(ijc, cp)}{I J (c_h - c_l + 1)}.$$

It is easy to see that the lower $\xi(cp) \in [0, 1]$ is, the better the predictor $cp$ is.

For the experiments we used a motion capture glove, which produces a finite sequence of
11-dimensional real vectors representing the readings of 10 installed sensors (5 finger bend
sensors, 2 accelerometers "pitch" and "roll", and 3 accelerometers "OX", "OY" and "OZ" recording
movement of the hand) and a time coordinate. While the average number of critical points remains
similar across gestures (see Table 1), it varies when grouping $F_{ijc}$ by sensor ($j$): the
accelerometers "OX", "OY" and "OZ" in particular have a very high number of critical points, while
the finger sensors show significantly smaller averages, as we can observe in Table 2.



Table 1: Average number of critical points by gesture

Gesture   Avg. number of critical points   Gesture   Avg. number of critical points

1         6.4                              11        6.6
2         9.2                              12        12.9
3         8.8                              13        9.5
4         5.9                              14        10.3
5         6.2                              15        8.5
6         6.6                              16        6.3
7         8.9                              17        7.3
8         8.4                              18        10.0
9         6.8                              19        10.1
10        9.4                              20        6.5

As the input data we used data collected from the motion capture glove: $K = 15$ executions of
each of $I = 20$ gestures. Each execution gives us a set of $J = 10$ sequences, one for each of
the sensors in the glove.
Table 2: Average number of critical points by sensor. As we see, the accelerometer
readings (6-10) have significantly more critical points than the finger sensors

Sensor   Avg. number of critical points   Sensor   Avg. number of critical points

1        3.90                             6        7.25
2        4.80                             7        9.30
3        4.35                             8        15.90
4        4.00                             9        14.50
5        3.55                             10       14.65



These data are resampled to the length $M = 64$. Having discretized sequences of length 64, we
decided for this experiment that local extrema will be calculated with $\gamma = 1$. This means
that a value is considered a local maximum or minimum if it is, respectively, higher or lower
than its neighbours. We then conducted two experiments:

• In Experiment A we set $c_l = 4$ and $c_h = 11$, then used the generated
$\{F_{ijc}\}_{i \in \mathcal{I}, j \in \mathcal{J}, c \in \mathcal{C}}$, where
$\mathcal{I} = \{1, \dots, 20\}$, $\mathcal{J} = \{1, \dots, 10\}$, $\mathcal{C} = \{4, \dots, 11\}$,
as a single data set.

• In Experiment B we generated 8 different datasets
$\{F_{ijc}\}_{i \in \mathcal{I}, j \in \mathcal{J}}$, each one with a fixed $c = 4, \dots, 11$.
This experiment is designed to analyse whether the efficiency of our predictor varies depending
on the number of clusters.

In both Experiment A and B we checked three different predictors $cp$ for selecting the number
of states of the HMM used to detect $F_{ijc}$. We considered

• all points computed from the sensor data, $cp = cp(G^i_j)$;

• all points without the boundary points, $cp = cp(G^i_j) - 2$;

• $cp = cp(G^i_j) - 1$, which corresponds to the number of trends.
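
The three candidate predictors can be derived from the per-execution critical point counts as sketched below. The function name `predictors` and the dictionary keys are hypothetical; the median over executions follows Section 4.4.2, and truncating a half-integer median to an integer is an assumption of this sketch (with an odd number of executions the median is already an integer).

```python
import statistics

def predictors(cp_counts):
    """Three candidate numbers of HMM states from per-execution critical point counts.

    cp_counts -- critical point counts of all executions of one gesture and sensor
    """
    base = int(statistics.median(cp_counts))  # assumption: truncate .5 medians
    return {
        "all points": base,
        "no boundaries": base - 2,
        "trends": base - 1,
    }
```

For counts `[4, 6, 5]` the median is 5, giving candidate state numbers 5, 3 and 4 respectively.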

Also, in both A and B, given the different nature of the finger sensors (1, ..., 5) and the
accelerometers (6, ..., 10), we decided to consider two possible ranges of $j$:

• $j = 1, \dots, 10$, representing input from all sensors;

• $j = 1, \dots, 5$, representing only the finger sensors.

Therefore, both Experiment A and B are performed with three different predictors and two
different sensor ($j$) ranges.

The result of each experiment is a value $\xi(cp) \in [0, 1]$, representing how close, on average,
$AIC(F_{ijc})_{cp}$ was to $AIC(F_{ijc})_{min}$. It is our thesis that an HMM with an appropriately
set number of states will give very low results (close to 0).
Table 3: Values of $\xi$ from the analysis of all files for three different
predictors $cp$. We can observe significantly better results with the
finger sensors than with the accelerometers

               $cp = cp(G^i_j)$   $cp = cp(G^i_j) - 2$   $cp = cp(G^i_j) - 1$
               (all points)       (no boundaries)        (trends only)

All sensors    0.2003             0.1457                 0.1723
Fingers only   0.0216             0.0125                 0.0174

5.1 Results 

5.1.1 Experiment A 

First we present the results of the analysis of all the sequences. Table 3 shows the average
positioning ratio $\xi$ of $AIC(F_{ijc}, cp)$. We considered three different predictors and two
different ranges of $j$: all sensors and only finger sensors.

Additionally, we computed the average $\xi$ of $AIC(F_{ijc}, cp)$ for each gesture and each
sensor separately; the results can be seen in Tables 4 and 5.



Table 4: Average value of $\xi$ by gesture

Gesture   Average $\xi$   Gesture   Average $\xi$

1         0.0809          11        0.0837
2         0.1913          12        0.3580
3         0.1606          13        0.1681
4         0.0563          14        0.2068
5         0.0624          15        0.1496
6         0.0860          16        0.0640
7         0.1746          17        0.1265
8         0.1597          18        0.2134
9         0.0994          19        0.2016
10        0.1865          20        0.0852


5.1.2 Experiment B 

Next we present the results in groups divided by the number of clusters. We expected that a
larger number of clusters would improve the results (up to a certain point). That would indicate
that the proposed method is perhaps merely an exchange: instead of the problem of choosing the
number of HMM states we would face the problem of choosing the number of clusters for
clusterization. As we see in Table 6, this did not happen. The value of the Akaike Information
Criterion does not change significantly when we vary the number of clusters $c$, which suggests
that the value of the predictor is independent of the parameter $c$.

While the results of HMMs trained on the input sequences do of course differ, the values
of the AIC coefficients are similar, and the expected improvement of results with a rising number
of clusters did not occur.

Table 5: Average value of $\xi$ by sensor. We can observe the quality of the predictor
deteriorating when applied to accelerometer data

Sensor   Average $\xi$   Sensor   Average $\xi$

1        0.0089          6        0.0496
2        0.0170          7        0.1118
3        0.0164          8        0.4675
4        0.0108          9        0.3818
5        0.0094          10       0.3840

6 Conclusion 

Choosing the number of HMM states according to the number of critical points in a given dataset
gives very good results when dealing with finger sensors, where the computed AIC effectiveness of
HMMs with the number of states equal to $cp$ is on average in the upper 2% of the results. Such
good results suggest that the proposed method is viable efficiency-wise. When we apply this method
to all sensors, its efficiency drops. Even though it still produces above-average results, it is
visible that the accelerometers and their readings are a considerable challenge. Surprisingly,
changing the number of clusters in the clusterization phase did not have a significant effect on
the efficiency of the method, which suggests that the proposed solution is not a simple exchange
of the problem of selecting the number of HMM states for the problem of deciding the number of
clusters for clusterization.
                     $cp = cp(G^i_j)$   $cp = cp(G^i_j) - 2$   $cp = cp(G^i_j) - 1$
                     (all points)       (no boundaries)        (trends only)

All sensors, c = 4   0.178              0.134                  0.156

[Remaining rows incomplete in the source; surviving values: 0.015, 0.024, 0.015, 0.029, 0.173,
0.027, 0.173, 0.154, 0.029, 0.016, 0.032, 0.029]